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Abstract 

We consider the problem of linear fitting of noisy data in the case of broad 
(say a-stable) distributions of random impacts (“noise”), which can lack 
even the hrst moment. This situation, common in statistical physics of small 
systems, in Earth sciences, in network science or in econophysics, does not 
allow for application of conventional Gaussian maximum-likelihood estima¬ 
tors resulting in usual least-squares hts. Such hts lead to large deviations of 
fitted parameters from their true values due to the presence of outliers. The 
approaches discussed here aim onto the minimization of the width of the dis¬ 
tribution of residua. The corresponding width of the distribution can either 
be dehned via the interquantile distance of the corresponding distributions or 
via the scale parameter in its characteristic function. The methods provide 
the robust regression even in the case of short samples with large outliers, 
and are equivalent to the normal least squares fit for the Gaussian noises. 
Our discussion is illustrated by numerical examples. 
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• Gorrect estimating of the linear £t parameters in a presence of large 
outliers 
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• The median of the empirical distribution of the residues determines 
line’s shift 

• The minimum of interquantile width determines line’s slope (1st method) 

• The maximum of characteristic function’s residues determines line’s 
slope (2nd method) 
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1. Introduction 

The method of least squares linear regression (straight line fitting) has a 
very long history: it was invented in its simplest form by C.F. GauB, but is 
still one of the most widespread and powerful approaches in data analysis. It 
may be used as a stand-alone tool to detect linear trends, or be incorporated 
into more cormlex analysis procedures, like Detrended Fluctuation Analysis 
proposed in [l|, whose first step requires subtraction of linear trends from 
subpartitions of data. The standard variant of the method assumes the linear 
relation between the dependent variable y and the independent one x, and 
the existence of a random impacts on the outcomes of single measurements, 
represented by the noise so that 

yi = axi + b + ( 1 ) 

and is aimed onto extracting information about a and b from such noisy 
data. The standard method works well if the data are “compact”, i.e. when 
the corresponding interval on the abscissa is homogeneously sampled and no 
large ordinate outliers are present. The method is essentially a paramet¬ 
ric one and can be regarded as the maximum likelihood approach assuming 
the Gaussian distribution of independent errors. The challenges of more 
complicated samples originating from modern problems of experimental and 
computational physics and related fields have motivated works aimed to im¬ 
prove the accuracy of fits to extremely irregular data, i.e. the ones having 
outliers on the ordinate and on the abscissa (leverage points), or large errors 
in locating Xi, see (2, for the list of modern modihcations. For this reason, 
a number of works discuss the criteria for a detection of this outliers with 
the following their elimination with respect to a prescribed cut-off level, and 
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the regression of obtained “cleared” samples or a choice of subintervals, 
where the influence of outliers could be negligible 0,0- Another problem 
arises for the non-independent noises which themselves can show trends 0. 

Even in the case of independent errors the problems arise if the noise pos¬ 
sess a heavy-tailed distribution, i.e. generates large outliers. These are quite 
characteristic for a large variety of process in small nonequilibrium systems, 
network dynamics, econophysics, etc. 0]. Since these distributions may lack 
even the hrst moment, their processing, if keeping the principles of the least- 
square regression untouched, requires very specihc methods 0, 101 including 
repeated median regression, the consideration of a nested hierarchy block 
subdivisions for the analyzed sample, etc. For such cases non-parametric 
regression methods may be superior to the standard one. 

In the present work we discuss two such approaches, the quantile re¬ 


gression as pioneered by Koenker and Basset [ll[, and the scale parameter 
regression based on the properties of characteristic functions. The methods 
are non-parametric (i.e. do not assume the specihc form of the distribution) 
and robust (i.e. do not rely on the existence of its moments). Our numeri¬ 
cal examples consider linear trend in presence independent errors distributed 
according to Levy stable laws. 

As a practical example, we consider geophysical data, namely the east¬ 
ward component of the geomagnetic held measured on a moving Antarctic ice 
shelf, showing a linear trend from the motion and a combination of small and 
large scale huctuations. Here the results of robust scale parameter regression 
are compared to conventional methods. 


2. Linear regression 

Before discussing the specihc methods, let us shortly review the general 
idea (or, better, general ideas) of linear regression. Posing the regression 
problem starts from the assumption that the values of the dependent variable 
(observable) yt linearly depend on a:*, but are subject to additive noise 
Eq.([T]). We are looking for the way of inferring of the parameters a and 6, 
delivering the best possible estimates a and b for these parameters. In the 
ideal situation (at least in the asymptotic setting when the total number of 
measurement points N gets large, N —)■ oo) the method should give a = a, 
b = b. In praxis, this is usually done by the application of the least squares 
ht. 

There are diherent ways to think about the least squares method. 
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First, we can follow the standard line of argumentation pertinent to sta¬ 
tistical inference and make a maximal likelihood estimate for the parameters 
a and b assuming the distribution of is Gaussian with zero mean and 
unknown dispersion, 

In this case the probability density of a given realization of is given by the 
product of such single-point distributions: 

^ / X -iV 

exp 

i=l 

Changing from to yi we get the corresponding density of the experimental 
outcomes {?/*}, 

f I n f \~^ I E*=i(l/* - aa;* - 6)2^ 

p[yi,...,yN\a,b) = [y2Tiaj exp I - — - I. 

Considering the log-likelihood of a and b provided the data, 

N (qj- — OT' — 

L{aM{y^}) = \np{{y,}\aM = const - ^ 

and maximizing it with respect to a and b, we get the least square prescription 
for hnding a and b by minimizing the sum of squared residues 

[yi — {axi Rby^ = min. 

i 

Note that this criterion, which essentially assumes the Gaussian prior is of 
course a parametric one, and therefore not robust. Assuming another distri¬ 
bution, say the Laplace one with 

will lead to a different criterion, in this case to the minimization problem of 

R = '^\yi- + ^)l• 
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Another approach to the linear regression is a geometric one. As above, 
the variables are assumed to be i.i.d. random variables drawn from a 
distribution which we will be assumed continuous, symmetric and 

monomodah The coordinates of points are mutually independent. 

The points {xi, are considered as realizations of points in a two-dimensional 
cloud characterized by the density (joint probability density) p{x,^) = p{x)p{^) 
This cloud is mirror-symmetric with respect to x axis. The pairs {xi, Pi) with 
Pi depending on Xi are realizations of points of another two-dimensional cloud, 
which is obtained from the first one by a shift and an affine transformation. 
The regression aims on the restoration of these transformation parameters a 
and b so that the cloud with the density p{x,^) with ^ = p — {ax + b) indeed 
has the properties discussed above. One looks for the empirical estimators d 
and b of these parameters. 

If we say that this symmetry presumes the fact that the center of mass of 
the cloud lays on x axis and then that one of its main axes of inertia coincide 
with it, we get from the hrst requirement 

'^Pi- {axi + b) = 0 
i 

so that b = N~^ Yhihji ~ = {v) ~ a{x). Then one notes that the main 

axes of inertia of the two-dimensional body are such that the moments of 
inertia with respect to these are extremal, and requires the extremality of 

^ = “ (y) ~ “ (^))]^ 

i i 

(with I being the moment of inertia with respect to the x-axis) with respect to 
a with b dehned as before. This gives equations which dehne a and b from the 
least square method. However, the mirror-symmetry of the {x, .^)-cloud with 
respect to x-axes can be cast into different other extremality prescriptions or 
into the statement that half of its mass has to lay above, and half below the 
axis, which gives (provided b is defined) the robust median regression for a. 
The method should work in this form provided (p) and (x) do exist. If they 
do not (i.e. when the distribution of p is broad (outliers) or the distribution 
of X is broad (leverage points)), the standard problems arise. Note that the 
median method is sensitive to the centering of the cloud: it will break down 
if the center of the cloud is at the origin. 

Another variant of the geometric approach discussed below is based on 
a different consideration. It aims on hnding the estimate for a prior to 
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connecting it to b and is robust both with respect to outliers and to leverage 
points (which question is not a topic of the present work). 

Let us dehne the residues 


Ayi = (a - a)xi + {b - b) + 

and concentrate hrst on the obtaining of the best estimate a for the slope 
parameter a. We note that the parameter b only shifts the distribution of 
Ayi, and only influences the position of Ay^, while the parameter a influences 
the width of p{Ayi). In the case of exact tuning a = a this width is given by 
the one of the distribution of for a 7 ^ a the distribution of Ay^ (centered 
on 6 — 6 ) is a convolution of the distribution of ^ and the one of (a — a)xi, 
which now has a nonzero width. Since the convolution of two distributions 
is always ’’broader” than each of them, the minimal width will coincide with 
the one of the distribution of ^ and achieved for a = a. In a setting when the 
width of the distribution is given by its variance, the method again reduces 
to the least squares approximation: The empirical width is dehned as 

i 

and is minimized with respect to two free parameters a and b. 

3. Width regression 

In our approach we use the fact that while the parameter b only shifts 
the distribution of Ay^, and influences the position of Ayi, the width of the 
distribution of Ay^ is only influenced by the parameter a. In the case of 
exact tuning a = a this width is given by the one of the distribution of 
Our two regression approaches differ in the point of how this “width” of the 
distribution is dehned. 

As we have already seen, dehning the width by a variance of the corre¬ 
sponding distribution (provided it exists) leads to the standard least square 
prescription; its additional advantage is that the minimization procedure fol¬ 
lows by solution of a system of linear algebraic equations. Other dehnitions 
of width (for example estimation the hrst absolute moment of the distri¬ 
bution) lead to nonlinear equations which have to be solved numerically. 
Both methods estimate width via some absolute moments of the distribu¬ 
tion. Both methods do not work for distributions having power-law tails; the 
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first one fails for the ones with diverging second moment, the second one for 
the distributions with diverging hrst moment (like Cauchy distribution). 

Moments do not represent robust statistics since they do not exist for all 
distributions. The robust statistics is given by such measures of width which 
exist for all distributions of y and of x. There are several classes of such robust 
measures either pertinent to the distribution itself, say, its quantiles, or to 
its characteristic function, say its scale parameter. These two possibilities 
will be discussed in the forthcoming sections. In all our discussions we will 
only concentrate on outliers, and both in our numerical examples and in the 
practical one Xi are homogeneously distributed within a hnite interval. 

3.1. The interquantile distance regression 

3.1.1. Description of the method 

One of the robust estimates of width is given by interquantile distance 
of the corresponding distribution (since the cumulative distribution function 
(c.d.f.) and therefore the quantiles do exist for any proper PDF). 

The practical realization for a given set of data {xj}, j = 1..N is subdi¬ 
vided into two steps. Since the width of c.d.f. is invariant with respect to 
shifts, at the first step we consider the series 

y] = Vj - 

and its c.d./.’s C{ai) for the trial slopes a* equidistantly sampled with the step 
ha within some interval. We moreover hx some quantiles q and 1 — q dehning 
the width to be minimized (in the following examples we set g = 1/4). As 
it has been discussed above, the minimal half-width of C{a) corresponds to 
the best £t of a* = a. 

For each Oj, the obtained set of values ?/* is sorted in ascending order to 
whence the desired c.d.f. half-width is simply 

//U-(C(a.)) = y;3„^4,(2) 

A search of the minimum for the series ([2]) provides the index of desirable 
value tti = a. Here the square brackets denote an integer part of the fractions. 
Having obtained a, one can obtain the shift parameter b as the median of 
the distribution of yj — dxi. However, it should be pointed out that it might 
be preferable to obtain the value of b via the equidistant trials for which 
the c.d.f. of the series 

yj = yj - dXi - bi 
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has a median equal to zero instead of the single-run median search. This is 
the case for non-equispaced samples, since the algorithms for identifying the 
zero crossing provide better accuracy due to the possibility of interpolation. 

Practically, due to sample’s discreteness, we use the criterion of minimum 
for where y* is again the series of ?/* sorted in ascending order. Thus 

both htted parameters, a and b are determined. 

Although in our simple realization of the method we mostly obtain the 
quantiles by simply counting the points, we note that this can be done in a 


more elegant way using the quantile regression methods as pioneered in [11 


see 


1^ for the state of the art discussion). This general approach can be 


cast into the minimization problem, namely, in solving 


a = argmin^gjj ^ q\yi- 


axj 


(l-g)|l/i- 


axjl. 


( 3 ) 


2 = 1 ; yj>axi 


2=1; yjCiaXi 


where 0 < g < 1 is the regression quantile sought for. Formally, the method 
requires the existence of the hrst moment of the ^/-distribution, and may 
lead to instabilities when applied to situations with large outliers, although 
we never encountered them is our test runs. 


3.1.2. Maximizing sensitivity 

It should be pointed out that although the approach works for an arbi¬ 
trary part of c.d.fs width, the important question is, what quantile has to 
be chosen to provide the largest local sensitivity of the method. 

Let us at the beginning consider a centered distribution and take b—b = 0. 
Let us denote Aa = d — a. The distribution of centered y is a. convolution of 
the distributions of (a — a)x and of since are independent on x. 

For X homogeneously distributed on the interval [—lF/2,kF/2] the con¬ 
volution p{y) of the corresponding distributions can be expressed via the 
cumulative distribution function C{x) = namely 


p(y) 


1 

WAa 


C 



WAa\ 

) 


-C 



WAaY 

I ■ 


( 4 ) 


For Aa very large the distribution tends to a rectangular of width WAa, so 
that its interquantile distance (for given quantiles of index q and 1 — q) is 
linear in Aa. For Aa small the dependence of interquantile distance on Aa 
gets quadratic. 










Let us discuss this situation by expanding the cumulative functions C in 
Eq. dl]) in Taylor series around y. Since all even terms vanish, only the terms 
linear and cubic in lEAa/2 survive in the lowest orders, so that 


p{y) = 


1 


W^a 


C'{y)W^a + C"'{y) 


W^Aa^ 


= p{y) + ^^^W^Aa^. 


The position Qg of the g-th quantile is given by 


( 5 ) 


’*Qq 


p{y)dy = q- ( 6 ) 

and performing the integration 


Inserting the expression Eq.(|5]) into Eq.l 
we get Qg as the solution of the equation 

W^Aa^ , 

+ -^p'iQ,) = q- 

We note that for Aa = 0 the solution of C{Qg) = q gives exactly the quantile 
of the distribution of the noise, so that 

C{Qq) -q = p{Qg)AQg 

is proportional to the shift of this quantile when detuning a. The highest 
sensitivity is attained when the largest absolute shift |A(5q| for given Aa is 
observed. Since 

W^p'iQg 


AQg = 


-Aa^, 


3 p{Qg) 

this takes place when q is chosen such that the absolute value of the loga- 

P'iQq) 


rithmic derivative 


PiQq 


= max 


is attained at the point Qg of the error distribution. For example for the 
Cauchy distribution this are exactly the lower and the upper quartiles of the 
distribution. 

For a Gaussian distribution, for which the logarithmic derivative equals 
to Qg the absolute relative change in the quantile 

AQg 


Qq 


W\ 2 
= —Aa, 


doesn’t depend on the index. However, it should be kept in mind in practical 
applications that the chosen quantile must contain a sufficient number of 
points. 
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3.1.3. Numerical example 

Let us consider the signal y = ax+fe+i^, where ^ is a random variable with 
the symmetric null-centered a-stable density with the characteristic function 


0 (a;) = exp(- 7 “|a;|“), 


(7) 


where a G (0, 2] is the characteristic exponent and 7 > 0 is the scale pa¬ 
rameter. Note that for a < 2 the second moment is absent, therefore the 
dispersion-based methods are inapplicable, and for a G (0, 1] even the mean 
value diverges, thus one can not apply the approaches calculating the abso¬ 
lute values of deviations. 

Fig. [1] demonstrates an example of the fitting for the function y = 0.5x -|- 
0.2 corrupted by the white Levy noise with a = 1 (Cauchy distribution) 
with the scale parameter 7 = 5, i.e. with a quite large outliers, over the time 
interval t G [0, 100] sampled with the unit step. The random numbers are 
generated by the routine stblrnd [l^ based on the methods presented in 14, 
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The sample is processed by the written MATLAB routine with the step 
size 0.001 for both a taken from the interval [ 0 , 1 ] and b taken from [— 10 , 10 ]. 
The obtained pair (a, b) = (0.515, 0.149), while the conventional least squares 
method of linear £t provides sufficiently worse values (0.664, —10.807). 

Fig. m demonstrates the behavior of the basic statistics of the method, the 
half-width of the cumulative distribution function. It is naturally irregular 
since a single random realization is processed. However, the global minimum 
is clearly visible even on the background of multiple small local ones. Note 
that the presence of local minima might be a problem if the global one is 
shallow, as it happens in the example of Sec. 01 Therefore it is always 
advisable to plot the curve like in Fig. |2]to be able to estimate the possible 
uncertainties caused by this effect. In the case when such uncertainties are 
large it is better to resort to the method described in the next section. 

Fig. El shows the behavior of scaled half-width of the c.d.f. for different 
characteristic exponents and scale factors, i.e. the a-dependence of HW/'y. 
The curves are results of ensemble averaging over 10000 realizations. One 
can see that they all monotonically decrease when the distribution of errors 
tends to the normal distribution and have a universal shape (the deviations 
are within the error of averaging). This fact follows from the self-similarity of 
the distributions ([7]) since their arguments depend on the combination x /7 if 
the scale parameter is defined as in Eq.([7]). Additionally, this picture shows 
that although for different a these are the quantiles with different indices 
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Figure 1: The initial deterministic process (thin solid line, almost invisible because it is 
overlapped by the fit line), its sample with the added Levy noise (circles) and the results 
of fitting by the proposed method (thick solid line) and by the conventional least squares 
method (dashed line). 



Figure 2: The half-width of the cumulative distribution function for the samples Xj — aiXj 
as a function of the trial slopes for the data shown in the Fig. [TJ 
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Figure 3: The dependence of the minimal width for the cumulative distribution functions 
normed by the scale coefficients for the various characteristic exponents a and scales 
7 = 0.5 (diamonds connected by the solid line (blue in color online)), 7 = 1 (circles 
connected by the dashed line (black in color online)) and 7 = 5 (asterisks connected by 
the dash-dotted line (red in color online)). 


which are most sensitive to the deviation of a from its best valne a, hxing 
interqnantile distance (the half-width of c.d.fin onr case) as a test statistics 
practically provides a nniform qnality of slope’s determination. 

3.2. Scale parameter regression 

Another method is based on the estimating width of the distribntion via 
its characteristic fnnction f{k) = {exp{iky)), which is also an object which 
does exist for any proper distribntion. 

Since the distribntion of centered y is a convolntion of the distribntions 
of (a — a)x and of its characteristic fnnction fy{k) is the prodnct of the 
characteristic fnnctions of the distribntions of f^{,k), and of Aax, being 
fAa{k) = f exp(ikAax)p(x)dx = fx(kAa): 

fyi^) = fd^)fx{kAa). 

For example, for symmetric Levy noise with scale parameter 7 and homoge- 
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neous distribution of x on {—W/2, W/2) we get 


fyik) = 




exp(- 7 "|fc|") 


sm{W Aak/2) 
WAak/2 


1 -7"|fc|" 


W^Aa^ 

3 


k^ + ... 


( 8 ) 


(where the prefactor of k‘^ is simply the dispersion of the distribution of x). 
Thus, hxing some k (small enough so that the asymptotic expansion close to 
/c = 0 still works for both distributions of x and of we can look for the 
maximum in a of fy{k) which is attained exactly at Aa = 0 . 

Note that for Gaussian distribution of ^ Eq.dH]) for small k reduces to 


fy{k)^l-{^^ + ^lAa^)k\ 


describing a centered distribution with the total dispersion 


2 2 I 2 A 2 

Itot = 7 + 7x^a , 


so that minimizing the total width using the small-/c approach reduces to the 
minimizing of the dispersion of 1/7 its approximation by an empirical estima¬ 
tor leads to the least squares method. The local sensitivity of the method is 
always given by 7 ^/c^ so that it can be influenced by a judicious choice of k 
which has to be small enough to allow using the quadratic approximation (it 
depends e.g. on the higher moments of the x-distribution) but not too small 
to make the sensitivity too low. 

This appropriate value of k for an arbitrary a can be determined by 
the following reasoning. The function sin(iyAa/c/2)/(lTAa/c/2) in the hrst 
line of Eq. (0) is an oscillating function whose two roots closest to the global 
maximum at fc = 0 are located in 


Therefore, if the value of a can be restricted to a G [—a^ax, o^max] by inspec¬ 
tion, the frequency parameter can by taken as fc = 7 r(amaxhE)“h This results 
in the location of the main maximum within the prescribed interval only. 

Therefore, the operational idea of the method is to calculate the empirical 
characteristic function 


f{k\a) 


1 

N 


N 

^exp [ik{yj 


axj)] 


(9) 
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as an approximation for fy{k) for given a and consider its dependence on a 
for a fixed k within the range described above. 

The shift parameter b is omitted in Eq. ([9]) since it only introduces the 
phase multiplier 

fik\a) ^ 

which can be eliminated by considering 

(p{k,a) = \ f{k\a)\ 

(or alternatively by centering in real space). 

Fig. m demonstrates the example of the behavior for the function f{k\a) 
calculated for a single realization of the same linear function corrupted by 
Levy noise as in the Subsection 3.2 with the same spacing of the trial pa¬ 
rameter a. One can clearly see the maximum sought for, which allows to 
determine d = 0.504, a better estimate than the one obtained by the method 
of the previous section. Moreover, the curve is much smoother in compar¬ 
ison with Fig. [2] which allows to avoid false extrema. The still undefined 
parameter b can be determined using the median regression of detrended 
data Hi — dxi as described above, since in the present approach it only enters 
the phase shift and can only be dehned modulo 27i. 

3.3. Comparison of the methods 

Let us compare the efficiency of two proposed methods, primarily in de¬ 
termination of the line’s slope. Since individual realizations, especially in the 
case of small a, have a considerable variability, we performed the calculations 
for an ensemble of 1000 individual realizations (with the parameters given 
above), each of them fitted separately. Fig. [5] presents the resulting aver¬ 
age values of the slope and its root-mean-square deviations from the exact 
value a = 0.5. Fig. [B] shows a similar comparison for a fixed sample length 
(L = 200) but for different indices a of the noise’s distribution. 

One can see that both methods provide more than reasonable fitting 
even for very short samples. The method based on the characteristic func¬ 
tion is more accurate for shortest samples that can be explained by the 27t- 
periodicity of the random phase: since large outliers originated from Levy 
noise are relatively rare, their influence in the vicinity of the main frequency 
maximum is small for short samples while their presence in boundary quar- 
tiles strongly influences the half-width of c. d.f. For larger samples, the equiv¬ 
alent outliers and mod 27t result in larger errors in comparison with the 
results provided by the interquantile distance method. 
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Figure 4: The dependence of the characteristic function on the trial slopes around the 
main maximum. The parameters of the regular and noise components are the same as in 
Fig.H 



Figure 5: Upper panel: the ensemble averaged value of the slope determined via the quan¬ 
tile distribution width method (circles connected by solid lines) and via the characteristic 
function regression (asterisks connected by dashed lines) for various sample lengths. Lower 
panel: the root-mean-square deviations from the exact value for both methods. The pa¬ 
rameters of the regular and noise components are the same in Fig. [2] 
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Figure 6: Upper panel: the ensemble averaged value of the slope determined via the quan¬ 
tile distribution width method (circles connected by solid lines) and via the characteristic 
function regression (asterisks connected by dashed lines) for various indices of the noise 
distribution. Lower panel: the root-mean-square deviations from the exact value for both 
methods. The parameters of the regular and noise components are the same in Fig. [5] 

Let us turn to the a-dependence. Two methods perform slightly differ¬ 
ently at small a (Fig. [6]), otherwise reproducing the corresponding values 
very accurately. The root-mean-square deviation of a from the exact value 
is a monotonically decaying function of the Levy index for the characteristic 
function method. For the interquartile method it has a minimum around 
a = 1: this fact reflects various sensitivity of the method for different a; 
taking quartiles produces maximal sensitivity exactly for a = 1. 

4. Practical example 

As a practical test we process geomagnetic held data measured by a hux- 
gate magnetometer located at Halley, Antarctica on the Brunt ice shelf. Such 
data are known to be complex comprising regular oscillations, highly irregu¬ 
lar short bursts, and a linear trend originating from the ice shelf displacement 
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Figure 7: Plot of the eastward component of the geomagnetic field D at Halley, Antarctica 
measured at X-min resolution from January 26 to December 28, 1998, after [1^ (courtesy 
of M.P. Freeman, British Antarctic Survey) and the linear trend line with the coefficients 
determined via the scale parameter regression method. The time scale: seconds since 
January 1, 1998. 
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It should be pointed out that the de-trendin g of such data is one of the 


key problems of ice shelf-based data processing [17|]. Fig. [7] demonstrates 


the example of such data, the small-scale processing of which has been dis¬ 
cussed in the work [l^. Its authors highlighted the necessity of an additional 
median excluding even for very short portions of the data de-trended by a 
conventional method due to a presence of large outliers. The feature which 
makes this practical example different from our previous numerical ones is 
the correlated nature of the noise. However, one readily infers that the cor¬ 
relation time is short compared to the total measurement time, so that the 
methods should presumably work. 

The parameters of the noise were estimated as follows: the data were de¬ 
trended by the least square £t, and then the routine stblf it 13(] was applied 
to check whether the de-trended distribution belongs to the class of alpha- 
stable ones. The process rapidly converges to the following parameters: the 
characteristic exponent a = 1.39381, the skewness /3 = —0.0695959, the scale 
parameter 7 = 11.8844 and the location 5 = —2.33173. Thus, one can as¬ 
sume to a good approximation that (up to the correlated nature of the noise) 
the situation belongs to the class described above: the practically symmetric 
Levy noise. The nonzero location parameter appears due to inconsistencies 
in determination of the shift parameter by the usual least square approach 
as discussed below. 

The estimates for a and b given by the least mean square (LMS) regression 
and by the scale parameter regression (SPR) are (a, b) = (8.055 • 10“®, —45.8) 
and (8.006 ■ 10“®, —46.4) respectively. The results of the quantile regression 
(QR) for different quantiles are given in Tabled! 


Table 1: The linear fit parameters for different interquantile distances. 


Quantile interval 
[0.25 - 0.75] 
[0.30 - 0.70] 
[0.40 - 0.60] 
[0.475 - 0.525] 


Parameters (a, b) 
(8.237-10-^-50.1) 
(8.126- 10-®,-48.3) 
(8.257-10-6,-50.0) 
(8.161 - 10-6,-48.9) 


One readily infers that the results of application of LMS and SPR pro¬ 
cedures are quite similar, while the results of QR are stable with respect 
to the choice of the quantile, but overestimate the slope compared to the 
previous two methods. This fact can be traced back to the local irregularity 
of interquantile distance curve in the vicinity of a quite flat global minimum 
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Figure 8: The intequantile distance between Q0.7 and Q0.3 as a function of the trial slopes 
tti for the data shown in the Fig. [3 


(see Fig. |H]), whose flatness is partly due to the relatively large value of a. 
Therefore, the fact that the interquantile regression, which on the average 
might perform better than SPR for very long-tailed distributions in the case 
of small samples (compare with the results in Fig. [6]), does not warrant for 
better performance in a single run for more regular noises and large samples. 

Let us now concentrate on the comparison of SPR and LMS procedures 
as applied to the subdivisions of the whole sample and to the sample with 
excluded outliers, with the goal to compare new and conventional approach. 
The parameters of linear fits for a set of intervals obtained via the subdivision 
the initial time interval into two and four parts are presented in Table [2j One 
readily infers that the relative variation of slopes does not exceed 9% for the 
scale parameter regression in contrast to more then 20% for the conventional 
least squares fit. The latter results even in more irregular behavior of the 
shift parameter: for subdivision into four intervals it varies by a factor of 3 
compared to merely 20% as given by the robust method. Therefore, although 
large outliers influence the fitting results for both methods, the scale param- 


19 



Table 2: The comparison of linear fit parameters (fc, b) obtained by two methods (Scale 
Parameter Regression - SPR and Least Mean Square Regression - LMS) for the subdivided 
time intervals expressed as a ratio to the whole interval taken as a unit. 


Subinterval 


SPR 



LMS 


10.1] 

(8.006 

10- 

-6 

-46.4) 

(8.055 

lo-^ 

-45.8) 

|0.1/2] 

(8.121 

10- 

-6 

-48.2) 

(8.222 ■ 

lo-^ 

-47.5) 

[1/2,1] 

(7.731 

10- 

-6 

5 

-40.0) 

(7.690 

■ 10®, 

-36.9) 

10.1/4] 

(7.791 

10- 

-6 

-47.8) 

(7.804 

lo-^ 

-46.5) 

11/4,1/2] 

(7.652 

10- 

-6 

-41.0) 

(6.424 

10-6, 

-23.0) 

[1/2,3/4] 

(7.985 

10- 

-6 

-45.0) 

(8.556 ■ 

10-6, 

-54.2) 

|3/4.1] 

(7.278 

10- 

-6 

-40.0) 

(7.016 • 

10-6, 

-18.5) 


Table 3: The comparison of linear fit parameters (fc, b) obtained by two methods (Scale 
Parameter Regression - SPR and Least Mean Square Regression - LMS) after elimination 
of outliers. 

Cutoff level SPR LMS 

100% (8d)06^T(r^7^46l) (8.055 • 10"^ -45.8) 

25% (8.007 ■ 10"®, -46.4) (8.034 • 10"®, -46.2) 

1% (8.008 ■ 10-^-46.5) (8.007-10-^-46.8) 

eter regression allows for determination of the basic physical effect (speed of 
ice motion, which is a constant directly determining the trend’s slope) more 
accurately. 

As the second test for comparison with standard approach to the process¬ 
ing of data with large outliers, we discuss linear fitting of the same sample 
with excluded outliers. At the first step, we de-trended the data by the mean 
least square fit, stated the cutoff level, above which the points were excluded, 
and finally processed initial sample without the excluded points again. 

Table [3] shows the results of processing of regularized data in comparison 
with the original ones. As it should be, the exclusion of points, whose devia¬ 
tion exceed 1% of the maximal detected value, results in the equal (within a 
prescribed accuracy) coefficients of the linear £t. While the results for SPR 
practically do not change when adding the points with larger deviations, the 
ones of LMS show a considerable trend. 
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5. Conclusions 

The results of this work can be summarized as follows. We have discussed 
two methods for the robust linear fit to noisy signals, which can be applied 
to the case when the lower moments for the noise probability distribution 
diverge, e.g. for Levy noises. Both are based on the idea that the width of 
the distribution of the residues is the smallest when the slope of the regression 
line is chosen correctly, and differ in how this width is defined. 

The first method is the quantile regression approach. The second method 
deals with its counterpart in frequency domain, i.e. with the maximization 
of the trial characteristic function. Both approaches demonstrate their ro¬ 
bustness and high accuracy for the noise distributions with extremely large 
outliers and may be used for a wide range of applications, for which such 
a behavior is characteristic, e.g. in problems of plasma dynamics, econo¬ 
physics, etc. As a practical test we apply the methods to the data of the 
geomagnetic field measurements by a detector placed on an Antarctic ice 
shelf, showing large irregularity, and compare their performance to the one 
of standard approaches. In this case the scale parameter regression seems to 
perform the best. 
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